منابع مشابه
Cross-language plagiarism detection
Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (i) a comprehensive retrieval process for cros...
متن کاملCross-Language Plagiarism Detection Methods
The present paper provides a summary on the existing approaches to plagiarism detection in multilingual context. Our aim is to organize the available data for the further research. Considering distant language pairs is of a particular interest for us. Cross-language plagiarism detection issue has acquired pronounced importance lately, since semantic contents of a document can be easily and disc...
متن کاملBabelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection?
In the first part of the talk, I will present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition Machine Translation is also applied to enrich the knowledge resource with lexical information for all languages. We p...
متن کاملUsing Word Embedding for Cross-Language Plagiarism Detection
This paper proposes to use distributed representation of words (word embeddings) in cross-language textual similarity detection. The main contributions of this paper are the following: (a) we introduce new cross-language similarity detection methods based on distributed representation of words; (b) we combine the different methods proposed to verify their complementarity and finally obtain an o...
متن کاملDeep Investigation of Cross-Language Plagiarism Detection Methods
This paper is a deep investigation of cross-language plagiarism detection methods on a new recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to dra...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Language Resources and Evaluation
سال: 2010
ISSN: 1574-020X,1574-0218
DOI: 10.1007/s10579-009-9114-z